SiZer for time series: A new approach to the analysis of trends
Smoothing methods and SiZer are useful statistical tools for discovering statistically significant structure in data. Based on scale-space ideas originally developed in the computer vision literature, SiZer (SIgnificant ZERo crossing of the derivatives) is a graphical device for assessing which observed features are 'really there' and which are merely spurious sampling artifacts. In this paper, we develop SiZer-like ideas in time series analysis to address the important issue of the significance of trends. This is not a straightforward extension, since a single data set does not contain the information needed to distinguish 'trend' from 'dependence'. A new visualization is proposed that shows the statistician the range of available trade-offs. Simulation and real data results illustrate the effectiveness of the method.
Comment: Published at http://dx.doi.org/10.1214/07-EJS006 in the Electronic Journal of Statistics (http://www.i-journals.org/ejs/) by the Institute of Mathematical Statistics (http://www.imstat.org)
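The core SiZer idea, flagging where a smoothed curve's slope is significantly positive or negative across a family of bandwidths, can be sketched as follows. This is a minimal, generic illustration that assumes independent errors, a Gaussian kernel, a local linear slope estimate, a difference-based noise variance, and a pointwise z quantile. It is not the time-series method of the paper (which deals precisely with the trend versus dependence trade-off), and the function names are hypothetical.

```python
import math


def local_slope(xs, ys, x0, h):
    """Local linear slope estimate at x0 using a Gaussian kernel of bandwidth h.

    Returns (slope, l) where slope == sum(l_i * y_i), i.e. the estimate is linear in y.
    Very small h can make all weights underflow to zero; use reasonable bandwidths.
    """
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    sw = sum(w)
    xbar = sum(wi * x for wi, x in zip(w, xs)) / sw  # kernel-weighted mean of x
    s = sum(wi * (x - xbar) ** 2 for wi, x in zip(w, xs))
    l = [wi * (x - xbar) / s for wi, x in zip(w, xs)]
    return sum(li * y for li, y in zip(l, ys)), l


def rice_sigma2(ys):
    """Difference-based noise variance estimate; assumes independent errors."""
    d = [(b - a) ** 2 for a, b in zip(ys, ys[1:])]
    return sum(d) / (2 * len(d))


def sizer_map(xs, ys, grid, bandwidths, z=1.96):
    """For each bandwidth h, flag each grid point as +1 (significantly increasing),
    -1 (significantly decreasing) or 0 (not significant) using a pointwise z test."""
    sigma2 = rice_sigma2(ys)
    result = {}
    for h in bandwidths:
        row = []
        for x0 in grid:
            slope, l = local_slope(xs, ys, x0, h)
            # Under independent errors, Var(slope) = sigma^2 * sum(l_i^2).
            se = math.sqrt(sigma2 * sum(li * li for li in l))
            if slope > z * se:
                row.append(1)
            elif slope < -z * se:
                row.append(-1)
            else:
                row.append(0)
        result[h] = row
    return result
```

Each row of the returned dictionary corresponds to one row of a SiZer map; plotting the rows against the grid for many bandwidths gives the familiar colored display. With dependent data, the independent-error standard error above would generally be wrong, which is the difficulty the paper addresses.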
Analysis of dependence among size, rate and duration in internet flows
In this paper we rigorously examine the evidence for dependence among data size, transfer rate and duration in Internet flows. We emphasize two statistical approaches for studying dependence: Pearson's correlation coefficient and the extremal dependence analysis method. We apply these methods to large data sets of packet traces from three networks. Our major results show that Pearson's correlation coefficients between size and duration are much smaller than one might expect. We also find that correlation coefficients between size and rate are generally small and can be strongly affected by applying thresholds to size or duration. Based on Transmission Control Protocol (TCP) connection startup mechanisms, we argue that thresholds on size should be more useful than thresholds on duration in the analysis of correlations. Using extremal dependence analysis, we draw a similar conclusion, finding remarkable independence for extremal values of size and rate.
Comment: Published at http://dx.doi.org/10.1214/09-AOAS268 in the Annals of Applied Statistics (http://www.imstat.org/aoas/) by the Institute of Mathematical Statistics (http://www.imstat.org)
Support vector machines with adaptive penalty
The standard Support Vector Machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection by producing sparse solutions (Bradley and Mangasarian, 1998; Zhu et al., 2003). These learning methods are non-adaptive since their penalty forms are pre-determined before looking at the data, and they often perform well only in certain types of situations. For instance, the L2 SVM generally works well except when there are too many noise inputs, while the L1 SVM is preferred in the presence of many noise variables. In this article we propose and explore an adaptive learning procedure called the Lq SVM, where the best q > 0 is chosen automatically from the data. Both two-class and multi-class classification problems are considered. We show that the new adaptive approach combines the benefits of a class of non-adaptive procedures and gives the best performance of this class across a variety of situations. Moreover, we observe that the proposed Lq penalty is more robust to noise variables than the L1 and L2 penalties. An iterative algorithm is proposed to solve the Lq SVM efficiently. Simulations and real data applications support the effectiveness of the proposed procedure.
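The objective behind the Lq SVM, hinge loss plus an Lq penalty on the coefficients, can be written down for a linear classifier as a minimal sketch. The regularization weight `lam`, the averaging of the hinge loss, and the function names are assumptions for illustration; the paper's iterative solver and its data-driven choice of q are not reproduced here.

```python
def hinge(margin):
    """Hinge loss max(0, 1 - y f(x)) as a function of the margin y f(x)."""
    return max(0.0, 1.0 - margin)


def lq_penalty(w, q):
    """Lq penalty sum_j |w_j|^q for q > 0; convex only when q >= 1."""
    return sum(abs(wj) ** q for wj in w)


def lq_svm_objective(w, b, X, y, lam, q):
    """Average hinge loss of the linear rule f(x) = w.x + b plus lam * Lq penalty.

    X: list of feature vectors; y: labels in {-1, +1}.
    """
    margins = [yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi, yi in zip(X, y)]
    return sum(hinge(m) for m in margins) / len(X) + lam * lq_penalty(w, q)


X = [[2.0, 0.0], [-2.0, 0.0]]
y = [1, -1]
# Both points have margin 4, so the hinge term is zero and only the penalty remains.
print(lq_svm_objective([2.0, 0.5], 0.0, X, y, lam=1.0, q=1))  # 2.5
print(lq_svm_objective([2.0, 0.5], 0.0, X, y, lam=1.0, q=2))  # 4.25
```

Varying q changes how strongly small coefficients are penalized relative to large ones: q = 1 gives the sparsity-inducing L1 penalty and q = 2 the usual L2 penalty, while q < 1 makes the problem non-convex.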
Multiscale Exploratory Analysis of Regression Quantiles Using Quantile SiZer
The SiZer methodology proposed by Chaudhuri & Marron (1999) is a valuable tool for conducting exploratory data analysis. Since its inception, different versions of SiZer have been proposed in the literature. Most of these SiZer variants target the mean structure of the data and cannot provide any information about the quantile composition of the data. To fill this need, this article proposes a quantile version of SiZer for the regression setting. By inspecting the SiZer maps produced by this new SiZer, real quantile structures hidden in a data set can be revealed more effectively, while spurious features are filtered out. The utility of this quantile SiZer is illustrated via applications to both real data and simulated examples.
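A quantile version of SiZer replaces mean smoothing with quantile smoothing, which is built on the check (pinball) loss. The sketch below shows only that building block: a kernel-weighted, local-constant quantile estimate at one location. The Gaussian kernel, the local-constant form, and the brute-force minimization are assumptions chosen for simplicity and may differ from the estimator used in the article; the function names are hypothetical.

```python
import math


def check_loss(u, tau):
    """Quantile (check) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (1.0 if u < 0 else 0.0))


def local_quantile(xs, ys, x0, h, tau):
    """Kernel-weighted local-constant tau-quantile at x0 (Gaussian kernel, bandwidth h).

    Minimizes sum_i w_i * rho_tau(y_i - c) over c. The objective is piecewise linear
    and convex in c with kinks at the y_i, so a minimizer is always one of the
    observed values and a brute-force search over them suffices for a small sketch.
    """
    w = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x in xs]
    return min(ys, key=lambda c: sum(wi * check_loss(yi - c, tau) for wi, yi in zip(w, ys)))


# With equal weights, tau = 0.5 recovers the ordinary sample median.
print(local_quantile([0, 0, 0, 0, 0], [1, 2, 3, 4, 100], x0=0, h=1, tau=0.5))  # 3
```

Repeating such estimates over a grid of locations and bandwidths, and testing whether their slopes differ significantly from zero, is the kind of analysis a quantile SiZer map summarizes.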
Support vector machines with adaptive Lq penalty
The standard support vector machine (SVM) minimizes the hinge loss function subject to the L2 penalty or the roughness penalty. Recently, the L1 SVM was suggested for variable selection by producing sparse solutions [Bradley, P., Mangasarian, O., 1998
Visualization and inference based on wavelet coefficients, SiZer and SiNos
SiZer (SIgnificant ZERo crossing of the derivatives) and SiNos (SIgnificant NOnStationarities) are scale-space based visualization tools for statistical inference. They are used to discover meaningful structure in data through exploratory analysis involving statistical smoothing techniques. Wavelet methods have been used successfully to analyze various types of time series. In this paper, we propose a new time series analysis approach that combines wavelet analysis with the visualization tools SiZer and SiNos. We use certain functions of the wavelet coefficients at different scales as inputs, and then apply SiZer or SiNos to highlight potential non-stationarities. We show that this new methodology can reveal hidden local non-stationary behavior of time series that is otherwise difficult to detect.
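To make "functions of wavelet coefficients at different scales" concrete, the sketch below computes an orthonormal Haar transform and the squared detail coefficients at each scale, which form one time-localized series per scale that a smoothing-based tool could take as input. The Haar wavelet and the choice of squared coefficients are assumptions for illustration; the paper does not necessarily use either, and the function names are hypothetical.

```python
import math


def haar_dwt(x, levels):
    """Orthonormal Haar transform.

    Returns (details, approx): details[0] holds the finest-scale detail coefficients,
    approx the final approximation. len(x) must be divisible by 2**levels.
    """
    approx = list(x)
    details = []
    for _ in range(levels):
        if len(approx) % 2:
            raise ValueError("length must be divisible by 2**levels")
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(a - b) / math.sqrt(2) for a, b in pairs])
        approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    return details, approx


def scale_energies(details):
    """Squared detail coefficients: one time-localized energy series per scale."""
    return [[d * d for d in level] for level in details]


x = [3, 1, 4, 1, 5, 9, 2, 6]
details, approx = haar_dwt(x, 3)
energies = scale_energies(details)
# The transform is orthonormal, so total energy is preserved (both sums equal 173).
print(sum(map(sum, energies)) + sum(a * a for a in approx), sum(v * v for v in x))
```

A local burst of variability shows up as large values in the energy series at the corresponding scale and time, which is the kind of feature that SiZer or SiNos could then assess for significance.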
Long-range dependence in a changing Internet traffic mix
This paper provides an in-depth analysis of long-range dependence in a continually evolving Internet traffic mix by employing a number of recently developed statistical methods. Our study considers time-of-day, day-of-week, and cross-year variations in the traffic on an Internet link. Surprisingly large and consistent differences in the packet-count time series were observed between data from 2002 and 2003. A careful examination, based on stratifying the data according to protocol, revealed that the large difference was driven by a single UDP application that was not present in 2002. Another finding was that the large differences observed between the two years appeared only in the packet-count time series, not in byte counts (whereas conventional wisdom suggests that these should be similar). We also found and analyzed several time series that exhibited more “bursty” characteristics than could be modeled as Fractional Gaussian Noise. The paper also shows how modern statistical tools can be used to study long-range dependence and non-stationarity in Internet traffic data.
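One classical way to quantify long-range dependence is the aggregated-variance estimator of the Hurst parameter H: for a long-range dependent series, the variance of block means of size m decays like m^(2H - 2). The sketch below is that textbook estimator, not necessarily one of the methods used in the paper; it is also known to be biased by non-stationarity such as trends or level shifts. The function name and the block sizes are assumptions.

```python
import math
import random
import statistics


def aggregated_variance_hurst(x, block_sizes):
    """Estimate the Hurst parameter H via the aggregated-variance method.

    For a long-range dependent series, Var(block mean of size m) ~ m**(2H - 2),
    so H = 1 + slope / 2, where slope is the least-squares slope of
    log Var against log m.
    """
    log_m, log_v = [], []
    for m in block_sizes:
        nblocks = len(x) // m
        means = [sum(x[k * m:(k + 1) * m]) / m for k in range(nblocks)]
        log_m.append(math.log(m))
        log_v.append(math.log(statistics.variance(means)))
    mx = sum(log_m) / len(log_m)
    my = sum(log_v) / len(log_v)
    slope = (sum((a - mx) * (b - my) for a, b in zip(log_m, log_v))
             / sum((a - mx) ** 2 for a in log_m))
    return 1 + slope / 2


if __name__ == "__main__":
    rng = random.Random(0)
    white = [rng.gauss(0.0, 1.0) for _ in range(2 ** 14)]
    # Independent noise has no long-range dependence, so the estimate should be near 0.5.
    print(aggregated_variance_hurst(white, [1, 2, 4, 8, 16, 32, 64]))
```

Estimates of H noticeably above 0.5 indicate long-range dependence. Applying the estimator separately to strata of the data (for example by protocol or time of day) mirrors the kind of stratified comparison described above.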
Dependent SiZer: Goodness-of-Fit Tests for Time Series Models
In this paper, we extend SiZer (SIgnificant ZERo crossing of the derivatives) to dependent data for the purpose of goodness-of-fit tests for time series models. Dependent SiZer compares the observed data with a specific null model being tested by adjusting the statistical inference using an assumed autocovariance function. This new approach uses a SiZer-type visualization to flag statistically significant differences between the data and a given null model. The power of this approach is demonstrated through examples of Internet traffic time series. It is seen that such time series can be even burstier than predicted by the popular long-range dependent Fractional Gaussian Noise model.
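The key adjustment in a dependent-data SiZer is the standard error of a smoother. For any linear estimate sum_i l_i Y_i, the variance under an assumed null autocovariance gamma is sum_i sum_j l_i l_j gamma(i - j), rather than sigma^2 times sum_i l_i^2. The sketch below computes this with the Fractional Gaussian Noise autocovariance as an example null model. It is an illustration under these assumptions, not the authors' implementation, and the function names are hypothetical.

```python
import math


def fgn_autocov(k, hurst, sigma2=1.0):
    """Autocovariance of fractional Gaussian noise at integer lag k."""
    k = abs(k)
    two_h = 2 * hurst
    return 0.5 * sigma2 * ((k + 1) ** two_h - 2 * k ** two_h + abs(k - 1) ** two_h)


def smoother_se(l, autocov):
    """Standard error of sum_i l_i * Y_i when Cov(Y_i, Y_j) = autocov(i - j)."""
    n = len(l)
    var = sum(l[i] * l[j] * autocov(i - j) for i in range(n) for j in range(n))
    return math.sqrt(var)


weights = [0.25] * 4  # a simple moving average of four observations
print(smoother_se(weights, lambda k: fgn_autocov(k, hurst=0.5)))  # 0.5: H = 0.5 is white noise
print(smoother_se(weights, lambda k: fgn_autocov(k, hurst=0.8)))  # larger: positive correlation
```

Under H = 0.5 the autocovariance vanishes at nonzero lags, so the result matches the independent-error formula; for H > 0.5 the wider standard error means a feature must be more pronounced before it is flagged as a significant departure from the null model.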
Experimental Investigation of the Effects of Concrete Alkalinity on Tensile Properties of Preheated Structural GFRP Rebar
The combined effects of pre-exposure to high temperature and alkalinity on the tensile performance of structural GFRP reinforcing bars are investigated experimentally. A total of 105 GFRP bar specimens are pre-exposed to temperatures between 120°C and 200°C and then immersed in an alkaline solution with a pH of 12.6 for 100, 300, and 660 days. The test results show that the elastic modulus obtained after 300 days of immersion is almost the same as that obtained after 660 days. For all immersion periods considered, the preheated specimens show a slightly lower elastic modulus than the unpreheated specimens, with a maximum difference of only 8%. In all test cases, the tensile strength decreases as the alkaline immersion time increases, regardless of the preheating level. After 300 days of alkali immersion, the tensile strength of the preheated specimens is about 90% of that of the unpreheated specimens. However, beyond 300 days of alkali immersion, the tensile strengths are almost identical. These results indicate that the tensile strength and elastic modulus of structural GFRP reinforcing bars depend strongly on the alkali immersion period and only weakly on the preheating level. The specimens show typical tensile failure around the preheated location.